Synchronization Strings: Efficient and Fast Deterministic Constructions over Small Alphabets

Authors

  • Kuan Cheng
  • Xin Li
  • Ke Wu
Abstract

Synchronization strings were recently introduced by Haeupler and Shahrasbi [11] in the study of codes for correcting insertion and deletion errors. A synchronization string is an encoding of the indices of the symbols in a string; together with an appropriate decoding algorithm, it can transform insertion and deletion errors into standard symbol erasures and corruptions. This reduces the problem of constructing codes for insertion and deletion errors to the problem of constructing standard error correcting codes, which is much better understood. Synchronization strings are also useful in other applications, such as synchronization sequences and interactive coding schemes.

Amazingly, Haeupler and Shahrasbi [11] showed that for any error parameter ε > 0, synchronization strings of arbitrary length exist over an alphabet whose size depends only on ε. Specifically, [11] obtained an alphabet size of O(ε^{-4}), as well as a randomized algorithm that constructs such a synchronization string of length n in expected time O(n^5). However, it remains an interesting question to find deterministic and more efficient constructions. In this paper, we improve the construction in [11] in three respects: we achieve a smaller alphabet size, a deterministic construction, and a faster algorithm to construct synchronization strings. Along the way we introduce a new combinatorial object and establish a new connection between synchronization strings and codes for insertion and deletion errors: such codes can be used in a simple way to construct synchronization strings. This new connection complements the connection found in [11], and may be of independent interest.

∗ Department of Computer Science, Johns Hopkins University. Supported by NSF Grant CCF-1617713.
† Department of Computer Science, Johns Hopkins University. Supported by NSF Grant CCF-1617713.
‡ Department of Computer Science, Johns Hopkins University.
arXiv:1710.07356v1 [cs.IT] 19 Oct 2017
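
The abstract does not restate the definition it builds on. As a rough illustration only, the sketch below assumes the definition commonly attributed to [11]: a string S of length n is an ε-synchronization string if, for all indices i < j < k, the insertion-deletion distance between S[i, j) and S[j, k) is greater than (1 − ε)(k − i). The function names are hypothetical, and the brute-force check is meant for tiny examples; it is not the construction from this paper or from [11].

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def indel_distance(a, b):
    """Edit distance using insertions and deletions only (no substitutions)."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


def is_synchronization_string(s, eps):
    """Brute-force check of the assumed eps-synchronization property.

    Tests every pair of adjacent intervals s[i:j], s[j:k]; far too slow
    for long strings, intended only to make the definition concrete.
    """
    n = len(s)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n + 1):
                if indel_distance(s[i:j], s[j:k]) <= (1 - eps) * (k - i):
                    return False
    return True
```

Under this assumed definition, a string of pairwise distinct symbols passes the check for any ε > 0, while any string containing two equal adjacent blocks (for example "abab") fails for every ε < 1. This is why the interesting question is how small the alphabet can be as a function of ε alone.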


Similar resources

Synchronization Strings: Explicit Constructions, Local Decoding, and Applications

This paper gives new results for synchronization strings, a powerful combinatorial object that allows one to deal efficiently with insertions and deletions in various communication problems: • We give a deterministic, linear-time synchronization string construction, improving over an O(n^5) time randomized construction. Independently of this work, a deterministic O(n log log n) time construction was ...


Efficient discovery of common patterns in sequences over large alphabets

We consider the problem of identifying motifs, recurring or conserved patterns, in the data modeled as strings or sequences. In particular, we present a new deterministic algorithm for finding patterns that are embedded as exact or inexact instances in all or most of the input strings. The proposed algorithm (1) improves search efficiency compared to existing algorithms, and (2) scales well wit...


Towards Regular Languages over Infinite Alphabets

Motivated by formal models recently proposed in the context of XML, we study automata and logics on strings over infinite alphabets. These are conservative extensions of classical automata and logics defining the regular languages on finite alphabets. Specifically, we consider register and pebble automata, and extensions of first-order logic and monadic second-order logic. For each type of auto...


Efficient Computation of Gapped Substring Kernels on Large Alphabets

We present a sparse dynamic programming algorithm that, given two strings s and t, a gap penalty λ, and an integer p, computes the value of the gap-weighted length-p subsequences kernel. The algorithm works in time O(p|M| log |t|), where M = {(i, j) | s_i = t_j} is the set of matches of characters in the two sequences. The algorithm is easily adapted to handle bounded length subsequences and diffe...


Efficient computation of gap-weighted string kernels on large alphabets

We present a sparse dynamic programming algorithm that, given two strings s, t, a gap penalty λ, and an integer p, computes the value of the gap-weighted length-p subsequences kernel. The algorithm works in time O(p|M| log min(|s|, |t|)), where M = {(i, j) | s_i = t_j} is the set of matches of characters in the two sequences. The new algorithm is empirically evaluated against a full dynamic progra...



Journal title:
  • CoRR

Volume abs/1710.07356

Publication date 2017